Re: [SQL] optimizing 2-table join w/millions of rows

Поиск

Список

Период

Сортировка

От	Herouth Maoz
Тема	Re: [SQL] optimizing 2-table join w/millions of rows
Дата	23 ноября 1998 г. 08:08:30
Msg-id	l03110700b27f0a10d780@[147.233.159.109] обсуждение исходный текст
Ответ на	Re: [SQL] optimizing 2-table join w/millions of rows (Michael Olivier <molivier@yahoo.com>)
Список	pgsql-sql

Дерево обсуждения

At 5:05 +0200 on 23/11/98, Michael Olivier wrote:

>
> > How about:
> >
> > SELECT U.acctname
> > FROM   usertest U, bgndtest B
> > WHERE  B.acctname = U.acctname
> >   AND  B.part_needed=3 AND B.loc_needed=5
> >   AND  U.part=2 AND U.loc=3;
>
> Yes, that looks equivalent. My problem is this is too slow an
> operation as I'm benchmarking it right now. And if I add less-than or
> greater-than comparisons, the performance goes _way_ down from there.
> How can I get the best performance out of this kind of operation?
>
> Is there any way to force postgres to hold certain tables in memory
> all the time? As I said, cost of memory isn't an issue, but
> performance is.

This is very strange. As a general rule, a simple join should always be
faster than a query with a subquery. Especially when that subquery is "in"
and not "exists".

In principle, if there is an index on the part, loc and acctname (separate
indices) in usertest, and on part_needed, loc_needed and acctname in
bgndtest, the most optimal query should be: restrict each of the tables by
the literal comparisons, and then match the acctname.

I don't know who wrote the optimiser, and I'm not sure he is on this list.
There are optimizer settings which can be changed, such as SET R_PLANS ON.

In any case, if you see in "real life" that a subquery actually works
better for you, then avoid the join. No reason to do both:

SELECT acctname
FROM bgndtest
WHERE part_needed=3 and loc_needed=5
  AND acctname IN (
      SELECT acctname
      FROM usertest
      WHERE part=2 AND loc=3
  );

Or better yet:

SELECT acctname
FROM bgndtest B
WHERE part_needed=3 and loc_needed=5
  AND EXISTS (
      SELECT *
      FROM usertest U
      WHERE part=2 AND loc=3
        AND U.acctname = B.acctname
  );

Of course, you may do it the other way around. It depends on which of your
tables is supposed to have more records, and which of the comparisons is
actually a "=" comparison, which is very important, because less-than and
greater-than don't use the indices.

What I mean by "the other way around" is:

SELECT acctname
FROM usertest U
WHERE part=2 and loc=3
  AND EXISTS (
      SELECT *
      FROM bgndtest B
      WHERE part_needed=3 and loc_needed=5
        AND U.acctname = B.acctname
  );

As for your question - there is no way to keep an entire table in memory,
unless you have a ram disk.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

В списке pgsql-sql по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [SQL] optimizing 2-table join w/millions of rows